Improving Storage System Reliability with Proactive Error Prediction
Authors
Abstract
This paper proposes the use of machine learning techniques to make storage systems more reliable in the face of sector errors. Sector errors are partial drive failures, where individual sectors on a drive become unavailable, and occur at a high rate in both hard disk drives and solid state drives. The data in the affected sectors can only be recovered through redundancy in the system (e.g. another drive in the same RAID) and is lost if the error is encountered while the system operates in degraded mode, e.g. during RAID reconstruction. In this paper, we explore a range of different machine learning techniques and show that sector errors can be predicted ahead of time with high accuracy. Prediction is robust, even when only little training data or only training data for a different drive model is available. We also discuss a number of possible use cases for improving storage system reliability through the use of sector error predictors. We evaluate one such use case in detail: We show that the mean time to detecting errors (and hence the window of vulnerability to data loss) can be greatly reduced by adapting the speed of a scrubber based on error predictions.
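The scrubbing use case in the abstract can be illustrated with a minimal sketch. It is not the paper's scrubber or predictor: the `ScrubPolicy` class, its rates, and its threshold are hypothetical values chosen for illustration. The sketch assumes a sequential scrubber running at a constant rate, an error that appears at a uniformly random position relative to the scrubber, and detection only by scrubbing (not by regular reads). Under those assumptions the mean time to detection is half the duration of one scrub pass, so raising the scrub rate when an error is predicted shortens the window of vulnerability proportionally.

```python
from dataclasses import dataclass


@dataclass
class ScrubPolicy:
    """Hypothetical policy: choose a scrub rate from a predicted error probability."""
    base_rate_mb_s: float = 10.0      # assumed default scrub throughput
    boosted_rate_mb_s: float = 40.0   # assumed throughput when an error is predicted
    threshold: float = 0.5            # assumed cut-off on the predictor's output

    def rate_for(self, predicted_error_prob: float) -> float:
        """Return the scrub rate (MB/s) to use for a drive."""
        if not 0.0 <= predicted_error_prob <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        if predicted_error_prob >= self.threshold:
            return self.boosted_rate_mb_s
        return self.base_rate_mb_s


def scrub_pass_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Duration of one full sequential scrub pass (1 GB taken as 1000 MB)."""
    return capacity_gb * 1000.0 / rate_mb_s / 3600.0


def mean_time_to_detection_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Expected wait until the scrubber reaches a uniformly placed error:
    half of one scrub pass, under the assumptions stated above."""
    return scrub_pass_hours(capacity_gb, rate_mb_s) / 2.0


if __name__ == "__main__":
    policy = ScrubPolicy()
    capacity_gb = 1000.0  # illustrative drive size
    for prob in (0.1, 0.9):  # illustrative predictor outputs
        rate = policy.rate_for(prob)
        mttd = mean_time_to_detection_hours(capacity_gb, rate)
        print(f"p={prob:.1f} rate={rate:.0f} MB/s mean time to detection={mttd:.2f} h")
```

A fixed two-level policy is used here only to keep the example short. A real system would also have to weigh the extra scrub I/O against foreground workload, which this sketch ignores.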
Related resources
Proactive error prediction to improve storage system reliability
This paper proposes the use of machine learning techniques to make storage systems more reliable in the face of sector errors. Sector errors are partial drive failures, where individual sectors on a drive become unavailable, and occur at a high rate in both hard disk drives and solid state drives. The data in the affected sectors can only be recovered through redundancy in the system (e.g. anot...
Grey Prediction Model for Forecasting Electricity consumption
Accurate prediction of the future electricity consumption is crucial for production electricity management. Since the storage of electrical energy is very difficult, reliable and accurate prediction of power consumption is important. Different approaches for this purpose were used. In this paper, Grey model (1,1) based on grey system theory has been used for forecasting results. Annual electric...
Architecting Dependable Systems with Proactive Fault Management
Management of an ever-growing complexity of computing systems is an everlasting challenge for computer system engineers. We argue that we need to resort to predictive technologies in order to harness the system’s complexity and transform a vision of proactive system and failure management into reality. We describe proactive fault management, provide an overview and taxonomy for online failure p...
An Evolutionary Method for Improving the Reliability of Safety-critical Robots against Soft Errors
Nowadays, robots are so much a part of our lives that many tasks are impossible without them. The application of robots is developing fast and their functions are becoming more sensitive and complex. One important requirement of robot use is reliable software operation. To enhance reliability, it is necessary to design the fault toler...
Prediction of fireball consequences caused by Boilover occurrence in the atmospheric storage tanks
Background and Objectives: Although boilover occurs with low frequency, when it does occur it can cause severe damage to people and equipment around the tank. Predicting the fireball of the boilover phenomenon plays an important role in adopting appropriate strategies for fire suppression of the atmospheric storage tank. The purpose of this study is to predict the consequence...